13 research outputs found

Toward an Understanding of Software Code Cloning as a Development Practice

Author: Kapser Cory
Publication venue: 'University of Waterloo'
Publication date: 18/09/2009

Code cloning is the practice of duplicating existing source code for use elsewhere within a software system. Within the research community, conventional wisdom has asserted that code cloning is generally a bad practice, and that code clones should be removed or refactored where possible. While there is significant anecdotal evidence that code cloning can lead to a variety of maintenance headaches --- such as code bloat, duplication of bugs, and inconsistent bug fixing --- there has been little empirical study on the frequency, severity, and costs of code cloning with respect to software maintenance. This dissertation seeks to improve our understanding of code cloning as a common development practice through the study of several widely adopted, medium-sized open source software systems. We have explored the motivations behind the use of code cloning as a development practice by addressing several fundamental questions: For what reasons do developers choose to clone code? Are there distinct identifiable patterns of cloning? What are the possible short- and long-term risks of cloning? What management strategies are appropriate for the maintenance and evolution of clones? When is the "cure" (refactoring) likely to cause more harm than the "disease" (cloning)? There are three major research contributions of this dissertation. First, we propose a set of requirements for an effective clone analysis tool based on our experiences in clone analysis of large software systems. These requirements are demonstrated in an example implementation which we used to perform the case studies prior to and included in this thesis. Second, we present an annotated catalogue of common code cloning patterns that we observed in our studies.
Third, we present an empirical study of the relative frequencies and likely harmfulness of instances of these cloning patterns as observed in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In summary, it appears that code cloning is often used as a principled engineering technique for a variety of reasons, and that as many as 71% of the clones in our study could be considered to have a positive impact on the maintainability of the software system. These results suggest that the conventional wisdom that code clones are generally harmful to the quality of a software system has been proven wrong.

University of Waterloo's Institutional Repository

Subjectivity in Clone Judgment: Can We Ever Agree?

Author: Anderson Paul
Godfrey Michael
Kapser Cory
Koschke Rainer
Rieger Matthias
van Rysselberghe Filip
Publication venue: Dagstuhl Seminar Proceedings. 06301 - Duplication, Redundancy, and Similarity in Software
Publication date: 01/01/2007

An objective definition of what a code clone is currently eludes the field. A small study was performed at an international workshop to elicit judgments and discussions from world experts regarding what characteristics define a code clone. Fewer than half of the clone candidates judged had 80% agreement among the judges. Judges appeared to differ primarily in their criteria for judgment rather than in their interpretation of the clone candidates. In subsequent open discussion, the judges provided several reasons for their judgments. The study casts additional doubt on the reliability of experimental results in the field when the full criterion for clone judgment is not spelled out.

Dagstuhl Research Online Publication Server
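
The agreement figure quoted in the abstract above can be illustrated with a small sketch. The abstract does not say how agreement was computed, so this assumes it means the share of judges siding with the majority verdict on a candidate; the function names and the sample votes are hypothetical and are not data from the study.

```python
from typing import Sequence


def agreement_level(votes: Sequence[bool]) -> float:
    """Share of judges siding with the majority verdict for one candidate.

    Assumption: each vote is True ("this is a clone") or False.
    """
    if not votes:
        raise ValueError("at least one judgment is required")
    yes = sum(votes)
    return max(yes, len(votes) - yes) / len(votes)


def share_with_agreement(
    judgments: Sequence[Sequence[bool]], threshold: float = 0.8
) -> float:
    """Fraction of candidates whose agreement level reaches the threshold."""
    if not judgments:
        raise ValueError("at least one clone candidate is required")
    levels = [agreement_level(votes) for votes in judgments]
    return sum(level >= threshold for level in levels) / len(levels)


# Hypothetical votes from five judges on four clone candidates.
sample = [
    [True, True, True, True, True],      # unanimous
    [True, True, True, True, False],     # 4 of 5 agree
    [True, True, True, False, False],    # 3 of 5 agree
    [False, False, True, True, False],   # 3 of 5 agree
]

if __name__ == "__main__":
    print(share_with_agreement(sample))
```

With these made-up votes, two of the four candidates reach the 80% threshold.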

Requirements specifications and recovered architectures as grounded theories

Author: Berry Daniel M.
Godfrey Michael W.
Holt Ric
Kapser Cory J.
Ramos Isabel
Publication venue: Sociology Press
Publication date: 01/01/2013

This paper describes the classic grounded theory (GT) process as a method to discover GTs to be subjected to later empirical validation. The paper shows that a well-conducted instance of requirements engineering or of architecture recovery resembles an instance of the GT process for the purpose of discovering the requirements specification or recovered architecture artifact that the requirements engineering or architecture recovery produces. Therefore, this artifact resembles a GT.

Universidade do Minho: RepositoriUM

Directory of Open Access Journals

Toward a Taxonomy of Clones in Source Code: A Case Study

Author: Cory Kapser
Michael W. Godfrey

Code cloning --- that is, the gratuitous duplication of source code within a software system --- is an endemic problem in large, industrial systems [9, 7]. While there has been much research into techniques for clone detection and analysis, there has been relatively little empirical study on characterizing how, where, and why clones occur in industrial software systems. In this paper, we present a preliminary categorization scheme for code clones, and we discuss how we have applied this taxonomy in a case study performed on the file system subsystem of the Linux operating system. Our case study yielded many surprising results, including that cloning is rampant both within particular file system implementations and across different ones, and that as many as 13% of the 4407 functions that are more than six lines long were involved in a clone-pair relationship.

CiteSeerX

Aiding Comprehension of Cloning Through Categorization

Author: Cory Kapser
Michael W. Godfrey

Management of duplicated code in software systems is important in ensuring their graceful evolution. Commonly, clone detection tools return large numbers of detected clones with little or no information about them, making clone management impractical and unscalable. We have used a taxonomy of clones to augment current clone detection tools in order to increase user comprehension of code duplication within software systems and to filter false positives from the clone set. We support our arguments by means of two case studies, in which we found that as much as 53% of clones can be grouped to form Function clones or Partial Function clones, and we were able to filter out as many as 65% of clones as false positives from the reported clone pairs.

CiteSeerX

Clone Detection: How accurate is your data set?

Author: Cory J. Kapser
Michael W. Godfrey

Duplication of code in software systems is considered to be a serious problem that can affect a system's maintainability and extendability. It is reported that 10-15% of code in a software system is involved in cloning. However, because of the difficulty of objectively measuring the number of false positives in a clone result set, the accuracy of these reports is difficult to evaluate. Although this is an important topic, little work has been done on evaluating the accuracy of clone detection methods. In this paper we propose a study to estimate, in an objective way, the number of false positives that are likely to be in a data set by measuring the number of clones found in a large body of unrelated code. We also propose a method to measure the impact of external factors such as programming idioms and API protocols on the detected result set. The results of this work will provide tools and knowledge to better evaluate the current state of the art of clone detection research.

CiteSeerX
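
The baseline idea in the abstract above (counting clones reported in a large body of unrelated code) can be sketched as simple arithmetic. This is only an illustration under loud assumptions: the abstract gives no formula, so the sketch treats every clone found in unrelated code as a false positive and scales that count linearly with code size in KLOC. The function name, parameters, and numbers are hypothetical.

```python
def estimate_false_positives(
    baseline_clones: int,
    baseline_kloc: float,
    target_clones: int,
    target_kloc: float,
) -> float:
    """Estimate how many reported clones in a target system are false positives.

    Assumptions (not taken from the paper):
      * every clone reported in the unrelated baseline corpus is a false positive;
      * false positives scale linearly with code size (KLOC).
    """
    if baseline_kloc <= 0 or target_kloc <= 0:
        raise ValueError("code sizes must be positive")
    if baseline_clones < 0 or target_clones < 0:
        raise ValueError("clone counts must be non-negative")
    expected = baseline_clones * target_kloc / baseline_kloc
    # The estimate cannot exceed the number of clones actually reported.
    return min(expected, float(target_clones))


if __name__ == "__main__":
    # Hypothetical numbers: 50 clones in 1000 KLOC of unrelated code,
    # 400 clones reported in a 200 KLOC target system.
    estimate = estimate_false_positives(50, 1000.0, 400, 200.0)
    print(f"estimated false positives: {estimate:.1f} of 400 reported")
```

Linear scaling is a deliberate simplification; a detector that compares every pair of fragments may produce spurious matches at a rate that grows faster than code size.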

Noname manuscript No. (will be inserted by the editor)

Author: Cory J. Kapser
Michael W. Godfrey

Literature on the topic of code cloning often asserts that duplicating code within a software system is a bad practice, that it causes harm to the system’s design and should be avoided. However, in our studies, we have found significant evidence that cloning is often used in a variety of ways as a principled engineering tool. For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As features mature and become stable within the experimental subsystems, they can be migrated incrementally into the stable code base; in this way, the risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages associated with using them. We also examine through a case study the frequencies of these clones in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In this study, we found that as many as 71% of the clones could be considered to have a positive impact on the maintainability of the software system.

CiteSeerX

Cloning by Accident: An Empirical Study of Source Code Cloning across Software Systems

Author: Cory Kapser
Michael Godfrey
Raihan Al-Ekram
Richard Holt

One of the key goals of open source development is the sharing of knowledge, experience, and solutions that pertain to a software system and its problem domain. Source code cloning is one way in which expertise can be reused across systems; cloning is known to have been used in several open source projects, such as the SCSI drivers of the Linux kernel [16]. In this paper, we discuss two case studies in which we performed clone detection and analysis on several open source systems within the same domain: we examined nine text editors written in C, and eight X-Windows window managers written in C and C++. To our surprise, we found little evidence of "true" cloning activity, but we did notice a significant number of "accidental" clones --- that is, code fragments that are similar due to the precise protocols they must use when interacting with a given API or set of libraries. We further discuss the nature of "true" versus "accidental" clones, as well as the details of our case studies.

CiteSeerX

Four Interesting Ways in Which History Can Teach Us About Software

Author: Cory Kapser
Lijie Zou
Michael Godfrey
Xinyi Dong
Publication date: 01/01/2004

In this position paper, we outline four kinds of studies that we have undertaken in trying to understand various aspects of a software system's evolutionary history. In each instance, the studies have involved detailed examination of real software systems based on "facts" extracted from various kinds of source artifact repositories, as well as the development of accompanying tools to aid in the extraction, abstraction, and comprehension processes. We briefly discuss the goals, results, and methodology of each approach.

CiteSeerX